Forum:Woot dev offers some guidance
Woot dev
I'm the lead developer at Woot. We just banned a number of Woot-Off checkers for hitting our site too much. We're totally not opposed to third-party checkers, just ones that hit our servers 10,000 times an hour. We update our home page about every thirty seconds, so there's not much reason for anyone else to hit our home page more often than that. Woot-Off checker sites need to hit our site once (preferably http://www.woot.com/salerss.aspx, which has plenty of Woot-Off info in it and is smaller than the home page), cache the data, and serve the cached data to their users for 30 seconds. --Lukeduff 15:58, 1 February 2007 (UTC)
:Can you send me the list of checkers that have been banned, so I can remove them from the list we maintain here? --ikishk 16:07, 1 February 2007 (UTC)
::We banned them by IP number and they're often shared hosting servers, so it's hard to tell which trackers they were. You could match up IPs to hosts, but it'd probably be easier to just open each one in the list and see if it's busted. --Lukeduff 16:19, 1 February 2007 (UTC)
:::Yup, thanks anyway. What's your view on the non-image caching discussion right below this one?
:FYI, I've started to work on an "api" of sorts for using that xml feed instead of the main site (thank you Lukeduff!). I'll try to make it as generalized as possible so it can hopefully be integrated into any of the mods without much fuss. --Vrillusions 17:03, 1 February 2007 (UTC)
::If there are things missing from the salerss.aspx page that people need, I'll add them. We're never going to give away remaining quantity though. Folks can email me at luke@woot.com. --Lukeduff 18:20, 1 February 2007 (UTC)
:::It's missing %, unless I'm blind. Qty is in the forums after the sale; no point in ruining the fun by knowing ahead of time :) --ikishk 19:51, 1 February 2007 (UTC)
:::Luke, what's the deal with lilwoot.ashx? Do you want people using that, which hotlinks to woot servers?
Just need your opinion on those so "we" can be fair on the list. Most other woot checkers cache the images; it seems these don't. --ikishk 02:51, 2 February 2007 (UTC)
::::I would say that the lilwoot.ashx 'woot checkers' are not woot checkers at all. It is the same as going to the main woot.com page and viewing the product from there. The data gets pulled from the woot.com servers just as if you visited woot.com. Caching the images and data at a reasonable rate is the only way a woot checker will help. 198.169.188.225 15:23, 2 February 2007 (UTC)
:::::I agree, that's why there's all this commotion. However, who is "198.169.188.225"? :) --ikishk 17:37, 2 February 2007 (UTC)
:::Well, my question is: how reliable is the salerss feed? I monitored it during the December wootoff and noticed that for the first 12 hours there was simply a 500 error. That is the only reason I haven't been using it so far... What's woot's official stance on keeping that feed active? Thank you! --ircmaxell 10:06, 3 February 2007 (UTC)
::::My official stance is "sorry about that, I'll do better." The salerss.aspx page has been mainly unofficial until now. If I can get people using it, I'll make sure it's up 100% of the time. I need to rewrite it anyway. A couple of days before this last Woot-Off, I completely redid the code for the home page to read from an xml file pushed to the web servers instead of data from the database (partly why the last Woot-Off ran so smoothly). I'm going to update salerss.aspx to read from the same file.
::::Once updated, salerss.aspx's output cache will be (like the home page) invalidated whenever the xml file changes. The file gets written out whenever a new sale launches, whenever the current sale sells out, and every thirty seconds (to keep the stats updated). --Lukeduff 02:39, 4 February 2007 (UTC)
:::::Well, that's good enough for me. I'll start coding to pull from the rss feed instead of the main page. Thanks!
--ircmaxell 12:44, 4 February 2007 (UTC)
:Please read Suggestion to Woot for a suggestion. -- S. Gartner talk 22:50, 3 February 2007 (UTC)
:Lukeduff, would you be willing to outline all the XML tags you'll be using for salerss.aspx? A full list of tags, including woot-off-only tags, would save us time. Otherwise we can only parse salerss.aspx once we've seen it during a woot-off, which means I can only code my woot checker during a woot-off. That is a huge delay. -- Darkstar 03:42, 5 February 2007 (UTC)
::Darkstar, you can use the rss feed up there. The only thing that changes during a wootoff is that the wootoff tag changes to True (if I remember correctly)... The xml parser up top is a pretty good start. The hard part for me was coding the rss and front page parsers to pull the same data. The way mine works is to download the salerss feed; any response code other than 200 initiates a download of default.aspx. I have two different parsers written. Both have been tested and appear to be working well! And the xml parser has an added benefit: it's fast! -- ircmaxell 12:10, 5 February 2007 (UTC)
:::sold= was stuck on 0 the whole wootoff last time.
:During the wootoff I grabbed the xml file a few times. Here's the link: salerss_examples.zip (hosted on box.net since I don't think we can upload .zip files here). I also added a sample non-woot-off one to show the difference. --Vrillusions 15:42, 6 February 2007 (UTC)
::Awesome!!! This made me realize an error I had before the next wootoff!!! Thanks!!! --ircmaxell 16:13, 6 February 2007 (UTC)
::Important: I was starting to work on a woot cache that used the xml file and noticed there are a bunch of field name changes. Basically, all the woot-specific tags now have a "woot:" prefix. So "price" is now "woot:price", "condition" is "woot:condition", etc. Looks like they did that to make the xml more standardized.
--Vrillusions 20:46, 8 March 2007 (UTC)
:::As a follow-up, using JavaScript (or AJAX) to parse the XML will require you to call the XML tags by two different names depending on the browser. Firefox and similar browsers (Opera, Netscape...) will only be able to read the tag by its name without the namespace (e.g. price), whereas IE will only be able to read the tag by its full name including the namespace (e.g. woot:price). Something to keep in mind when coding. - Darkstar 21:35, 8 March 2007 (UTC)
I've pushed out the update for SaleRss.aspx. The performance should be good and the data is as up-to-date as the home page. The only tag change is moving soldoutpercentage to its own tag. During Woot-Off sales the value is between 0 and 1, giving the percentage of inventory sold. During non-Woot-Off sales, the value is limited to 0, .9 or 1. At .9, we do the bouncy I Want One button. --Lukeduff 21:25, 12 February 2007 (UTC)
:I believe I've fully integrated the existing features from my woot checker into the new XML parsing version. It also has a fallback to the index if salerss.aspx happens to die (either ircmaxell's or ikishk's idea). Only one way to test it out (hopefully not all too soon, though; my bandwidth usage is still recovering from the last woot-off). -- Darkstar 05:07, 13 February 2007 (UTC)
::Well, I thank you for your continued development and for working with us... I do have one request though. Could you put the item quantity (# of woots) in the feed when the item sells out? I've been pulling it from the main page for months, but with this system I can't without defaulting to the main page... Thanks! --ircmaxell 04:18, 14 February 2007 (UTC)
:::Wouldn't that only work for non-woot-offs? You wouldn't have enough time to fetch it for a sold-out item unless you scripted something to check every 5-10 seconds once an item sells out.
-- Darkstar 13:39, 14 February 2007 (UTC)
::::Well, my script, which updates every 30 seconds via a separate caching script, catches about 85% of the quantities during a wootoff (which is fine for me)... --ircmaxell 13:51, 14 February 2007 (UTC)
::::We're supposed to have something around a 2 minute delay between sales, but that doesn't always seem to work. I could add the quantity in anyway though. I'm going to work on getting the same tags on the full rss feed, so you could always pull previous sales up that way. --Lukeduff 15:56, 14 February 2007 (UTC)
:::::Just out of curiosity, what's the status on the quantity tag? (Not to sound pushy, but updating the quantity on each item is getting to be a PITA.) I store quantity for statistics calculations. --ircmaxell 04:50, 1 March 2007 (UTC)
:Just as an FYI, I've just released the second version (0.2) of Woot Watcher to the Firefox extensions folks; it's waiting in the approval queue. The big change in this version is that I've moved it from being a stand-alone toolbar to living in the status bar, to save real estate. It also uses the RSS feed now rather than the main page, to reduce load on the poor, sweaty Woot servers. You can download it from http://www.blackbear.com/wootwatcher/ until it gets approved. -- Blackbear 07:00, 19 February 2007 (UTC)
Proof of concept
See Forum:Using_woots_xml_page for the code.
hotlinking woot.com
QuickWoot - http://woot.xxxx.xx/ - written from scratch for improved speed - oh, I should mention, uses the black2d CSS design, but the code has been rewritten - 15 second cache
:The point of a woot checker is to relieve stress on the woot servers. Your "faster" version hotlinks to woot.com. Do you even have a local cache, or does it fetch woot data on every hit? I'm not gonna bother adding it to the list. --ikishk 03:08, 1 February 2007 (UTC)
::Hotlinks? How so? It pulls the images for the product from the woot.com servers, but so do all versions based on the black2d style.
Normally a black2d pulls a JavaScript file directly from the woot servers, which then pulls those images directly from the woot servers (so each person who loads the tracker first downloads the JavaScript file, then downloads all of the images from woot). This version causes less load on the woot servers than a normal black2d version (because loading the JavaScript file is no longer required), of which I see several in the current list.
:::Oh, and perhaps this will make it a little clearer: there are three things that a standard checker will grab: the woot page to get the %, the JavaScript file on the woot servers to get the product images, and the product images themselves. A normal black2d caches the woot page and %, but not the JavaScript or the product images. This version caches the woot page and %, and the JavaScript, but not the product images. Thus it is easier on the woot servers than a standard black2d, which you consider to be good.
::::Pulling text is less bandwidth than images. Most checkers cache at least the image. Some even cache the text for a certain % of requests. I'm not trying to be an elitist f*uck, I'm trying to tell you your tracker doesn't help woot. Cache the image and yer all good. --ikishk 15:52, 1 February 2007 (UTC)
:::::True, pulling text is less bandwidth... However, bandwidth is not the concern here (I believe); it's CPU cycles. Pulling an image takes no time and almost zero CPU time or memory. Pulling an ASP file takes significant CPU power and memory... --ircmaxell 13:26, 15 February 2007 (UTC)
I'm interested to know why http://woot.xxx.be was so unceremoniously deleted without any response to the very clear explanation addressing the concerns about its internal workings. Interested in running a good wiki, or interested in promoting yourselves?
:Nice try.
I responded, and even left it on this page a while, then I moved it to the forum: Forum:Apache_logs_w/_checkers. If we were promoting ourselves, why are the checkers we run not the first you see on the lists? We are trying to be consistent and remain true to the whole idea of woot checkers, while you are hotlinking off woot. Cache the images locally, then it's perfect. --ikishk 01:44, 2 February 2007 (UTC)
::Fine, please delete all black2d trackers from your list. They hotlink to woot.com and offend me. Look at their source code. You will find I am right. Hotlinks right to a JavaScript file.
:::Very interesting, maybe Luke can weigh in on that one: http://www.woot.com/lilwoot.ashx
Leeching images actually isn't as big a problem for Woot as people hitting our application pages too much. Unlike a news site or photo site, we don't have a lot of images to leech in the first place, and people's browsers will cache the image after the first load anyway. Our bandwidth usage for ASPX is much higher than for JPG/GIF/CSS. Also, we're currently looking at proposals from places like Akamai to host our media, so the load on our servers won't really be an issue. We'll certainly crack down on heavy image leeching down the road if we think it's a problem. --Lukeduff 02:46, 4 February 2007 (UTC)
:Excellent clarification. I'll add the above checker to the list. --ikishk 20:15, 4 February 2007 (UTC)
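For anyone building a checker along the lines discussed on this page, here is a minimal Python sketch of the two ideas that came up: fetch salerss.aspx at most once every 30 seconds and serve everyone from the cached copy, and fall back to the home page when the feed does not return a 200. This is an illustration under assumptions, not anyone's actual checker: the class name `CachedSaleFetcher`, the `http_get` helper, and the full `default.aspx` URL are made up for the example, and the URLs are the 2007-era ones from the thread.

```python
import time
import urllib.request
from typing import Callable, Optional, Tuple

FEED_URL = "http://www.woot.com/salerss.aspx"      # feed named in the thread
FALLBACK_URL = "http://www.woot.com/default.aspx"  # assumed full URL of the home page
CACHE_SECONDS = 30                                  # refresh interval Luke asked for


def http_get(url: str) -> bytes:
    """Fetch a URL, treating anything other than HTTP 200 as a failure."""
    # urlopen already raises HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(url, timeout=10) as response:
        if response.status != 200:
            raise RuntimeError(f"unexpected status {response.status} for {url}")
        return response.read()


class CachedSaleFetcher:
    """Hypothetical helper: one upstream request per TTL window, shared by all users."""

    def __init__(
        self,
        fetch: Callable[[str], bytes] = http_get,
        ttl: float = CACHE_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._cached: Optional[Tuple[str, bytes]] = None
        self._fetched_at = 0.0

    def get(self) -> Tuple[str, bytes]:
        """Return (source_url, body); source_url tells the caller which parser to use."""
        now = self._clock()
        if self._cached is not None and now - self._fetched_at < self._ttl:
            return self._cached  # still fresh: no request goes to woot
        try:
            result = (FEED_URL, self._fetch(FEED_URL))
        except Exception:
            # Feed is down or returned a non-200: fall back to the home page.
            result = (FALLBACK_URL, self._fetch(FALLBACK_URL))
        self._cached = result
        self._fetched_at = now
        return result
```

A checker site would create one `CachedSaleFetcher` and call `get()` on every page view. Because the feed is XML and the home page is HTML, the returned source URL lets the caller pick the matching parser, as ircmaxell describes with his two parsers.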